This presents the updated analysis of start codon usage and context in Cryptococcus neoformans. It uses the best-transcript annotation and the corresponding start codon position and sequence map made by Corinne Maufrais in June 2018.
It covers both JEC21 and H99 data: first several analyses on JEC21, then the same analyses on H99, then a joint analysis of signals conserved across both strains.
We check consensus sequences for both the “narrow” (-4, NNNNATG) and the “wide” (-10, NNNNNNNNNNATG) neighbourhoods of the start codon, comparing annotated ATGs (aATGs) to downstream ATGs (dATGs), and find essentially the same results with both. For the subsequent analyses we mostly use the wide score.
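As an illustration of the two window widths, here is a minimal sketch of extracting a narrow or wide context from a transcript. It assumes the start codon is given as the 1-based position of the A of the ATG within the transcript; the helper `start_context` and the example transcript are hypothetical and not part of the original pipeline. How the original analysis handles ATGs too close to the 5' end is not stated; the sketch simply returns `None`.

```python
from typing import Optional

NARROW = 4   # -4 window: NNNNATG
WIDE = 10    # -10 window: NNNNNNNNNNATG


def start_context(transcript: str, atg_pos: int, upstream: int) -> Optional[str]:
    """Return `upstream` nucleotides before the ATG plus the ATG itself.

    `atg_pos` is ASSUMED to be the 1-based position of the A of the ATG.
    Returns None when the transcript is too short on the 5' side.
    """
    start = atg_pos - 1 - upstream  # 0-based index of the first upstream nt
    end = atg_pos + 2               # 0-based exclusive end, just after the G
    if start < 0 or end > len(transcript):
        return None
    window = transcript[start:end]
    if not window.endswith("ATG"):
        raise ValueError(f"no ATG at position {atg_pos}")
    return window


# Hypothetical transcript with its start ATG at 1-based position 16.
tx = "GGCACCAAAAGCAAAATGTCTGA"
print(start_context(tx, 16, NARROW))  # CAAAATG
print(start_context(tx, 16, WIDE))    # CAAAAGCAAAATG
print(start_context(tx, 16, 20))      # None (too close to the 5' end)
```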
Generalized additive model smooths use thin plate regression splines with basis dimension k = 4, from the mgcv package (Wood S.N. (2017) Generalized Additive Models: An Introduction with R, 2nd edition).
## # A tibble: 6,634 x 4
## # Groups: Gene [6,634]
## Gene RNA RPF TE
## <chr> <dbl> <dbl> <dbl>
## 1 CNM01300 3981. 18260. 4.59
## 2 CNM01080 8422. 8957. 1.06
## 3 CNA07570 5811. 7134. 1.23
## 4 CNG04360 3171. 7048. 2.22
## 5 CNB02360 3764. 6958. 1.85
## 6 CNA06350 15019. 6591. 0.439
## 7 CNC00700 2344. 6224. 2.65
## 8 CNF03840 11321. 6175. 0.545
## 9 CNF02150 15611. 6144. 0.394
## 10 CNF03160 5094. 6114. 1.20
## # ... with 6,624 more rows
We also calculated hiTrans_JEC21, the top 5% (330) translated genes by RPF TPM.
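The TE column in this table is consistent with RPF TPM divided by RNA TPM (for example, 18260 / 3981 ≈ 4.59 for CNM01300). Below is a minimal sketch of that ratio and of a top-fraction selection like hiTrans_JEC21, using rounded values from the printed table. The function names are hypothetical, and truncating the cutoff to an integer is an assumption, so this need not reproduce the count of 330 exactly.

```python
def translational_efficiency(rna, rpf):
    """TE = RPF TPM / RNA TPM for genes present in both dicts with RNA > 0."""
    return {g: rpf[g] / rna[g] for g in rna if g in rpf and rna[g] > 0}


def top_fraction(values, fraction):
    """Genes in the top `fraction` by value (count truncated to an integer)."""
    n = int(len(values) * fraction)
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:n])


# First rows of the JEC21 table (RNA and RPF in TPM, rounded).
rna = {"CNM01300": 3981, "CNM01080": 8422, "CNA07570": 5811,
       "CNG04360": 3171, "CNA06350": 15019}
rpf = {"CNM01300": 18260, "CNM01080": 8957, "CNA07570": 7134,
       "CNG04360": 7048, "CNA06350": 6591}

te = translational_efficiency(rna, rpf)
print(round(te["CNM01300"], 2))  # 4.59
print(top_fraction(rpf, 0.4))    # the 2 of 5 genes with the highest RPF
```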
## # A tibble: 6,639 x 19
## Gene aATG.context aATG.pos d1.context d1.posTSS d1.posATG d1.frame
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 CNA0… GACCCCCTTGT… 93 ATAGCTGGT… 226 133 1
## 2 CNA0… ATATTGCCTGA… 102 GTCCACCTT… 163 61 1
## 3 CNA0… GAACTATCAAG… 214 GAGGCTCCG… 512 298 1
## 4 CNA0… ATTTTCAACAG… 81 AGCAATATA… 307 226 1
## 5 CNA0… ACCGTGCACAC… 76 GTATTCGGG… 106 30 0
## 6 CNA0… AATCATACCAA… 117 GCCCCTATC… 186 69 0
## 7 CNA0… CCGACTATAAA… 52 AACCGTGCT… 112 60 0
## 8 CNA0… CTTTCTCTTCA… 77 TGCTATAGC… 98 21 0
## 9 CNA0… TAATCACACAA… 330 CTCATCATC… 391 61 1
## 10 CNA0… AAAAAAAACGC… 146 ACTTGTCGA… 184 38 2
## # ... with 6,629 more rows, and 12 more variables: d2.context <chr>,
## # d2.posTSS <dbl>, d2.posATG <dbl>, d2.frame <dbl>, u1.context <chr>,
## # u1.posTSS <dbl>, u1.posATG <dbl>, u1.frame <dbl>, u2.context <chr>,
## # u2.posTSS <dbl>, u2.posATG <dbl>, u2.frame <dbl>
That’s for hiTrans_JEC21, the top 5% (330) translated genes by RPF TPM.
Venn diagram
First upstream ATG.
First downstream ATG.
Except for 3rd-codon-position bias.
We calculate motif scores against the position weight matrix (PWM) for both the narrow (-4 through the ATG) and the wide (-10 through the ATG) Kozak consensus motifs. These motifs are derived from the top 5% most highly translated genes.
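Here is a minimal sketch of how a per-position probability matrix can be built from aligned start-codon contexts, such as the hiTrans aATG windows. The function name, the optional pseudocount, and the toy contexts are assumptions for illustration. The original analysis used Biostrings, whose PWM construction and defaults may differ.

```python
BASES = "ACGT"


def position_probability_matrix(seqs, pseudocount=0.0):
    """Per-position base probabilities from equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    counts = {b: [0.0] * length for b in BASES}
    for s in seqs:
        for i, base in enumerate(s):
            counts[base][i] += 1  # raises KeyError on N or other letters
    ppm = {b: [0.0] * length for b in BASES}
    for i in range(length):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        for b in BASES:
            ppm[b][i] = (counts[b][i] + pseudocount) / total
    return ppm


# Toy narrow (-4) contexts; real input would be the hiTrans aATG windows.
contexts = ["AAAAATG", "CAAAATG", "AACAATG", "GAAAATG"]
ppm = position_probability_matrix(contexts)
print(ppm["A"][0], ppm["C"][0])  # 0.5 0.25
print(ppm["T"][5])               # 1.0
```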
Information content is defined as for a sequence logo; details at https://en.wikipedia.org/wiki/Sequence_logo.
The total information content equals the total height of the letters in the sequence logo, summed across positions.
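Here is a minimal sketch of the sequence-logo definition for DNA. Each column carries log2(4) = 2 bits minus its Shannon entropy, and the total is the sum over columns. A fully conserved ATG therefore contributes 2 bits per position, or 6 bits in total. The sketch omits the small-sample correction described on the linked page, and the function names are hypothetical. Whether the original figures applied that correction is not stated here.

```python
import math

BASES = "ACGT"


def column_information(probs):
    """Information content (bits) of one column: log2(4) minus Shannon entropy.

    No small-sample correction is applied.
    """
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy


def total_information(ppm):
    """Sum of per-column information: the total letter height of the logo."""
    length = len(ppm["A"])
    return sum(column_information([ppm[b][i] for b in BASES]) for i in range(length))


# A fully conserved ATG contributes 2 bits per position, i.e. 6 bits in total.
atg = {"A": [1.0, 0.0, 0.0], "C": [0.0, 0.0, 0.0],
       "G": [0.0, 0.0, 1.0], "T": [0.0, 1.0, 0.0]}
print(total_information(atg))                    # 6.0
print(column_information([0.5, 0.5, 0.0, 0.0]))  # 1.0
```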
## # A tibble: 6 x 4
## Genes ATG Width Infon
## <chr> <chr> <chr> <dbl>
## 1 All aATG narrow 1.03
## 2 HiTrans aATG narrow 2.84
## 3 CytoRibo aATG narrow 4.01
## 4 All d1ATG narrow 0.131
## 5 HiTrans d1ATG narrow 0.307
## 6 CytoRibo d1ATG narrow 0.517
The information content of the highly translated consensus (excluding the 6 bits contributed by the ATG itself) is 2.84 bits for the narrow motif and 3.81 bits for the wide motif.
The per-position information content equals the total height of the letters at each position of the sequence logo. This is useful for comparing information across positions without being distracted by the identities of the letters.
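We calculate scores using Biostrings::PWMscoreStartingAt.
The best description of this method I could find is: https://support.bioconductor.org/p/61520/
The score is the sum, over positions, of the PWM entry for the nucleotide observed at each position. Equivalently, it is the sum of the element-wise product of the PWM with a one-hot encoding of the sequence.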
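Here is a minimal sketch of this scoring rule, written both as a per-position lookup and as the equivalent one-hot element-wise product. The scores in the tables top out at 1, so they appear to be scaled. The `relative_score` helper assumes division by the best achievable score, but that normalization is an ASSUMPTION and may differ from what the original code did. The toy PWM and all function names are hypothetical.

```python
BASES = "ACGT"


def pwm_score(pwm, seq):
    """Sum over positions of the PWM entry for the observed nucleotide."""
    return sum(pwm[base][i] for i, base in enumerate(seq))


def pwm_score_onehot(pwm, seq):
    """Same score as the sum of the element-wise product with a one-hot matrix."""
    onehot = {b: [1.0 if s == b else 0.0 for s in seq] for b in BASES}
    return sum(pwm[b][i] * onehot[b][i] for b in BASES for i in range(len(seq)))


def relative_score(pwm, seq):
    """Score divided by the best achievable score (ASSUMED normalization)."""
    best = sum(max(pwm[b][i] for b in BASES) for i in range(len(seq)))
    return pwm_score(pwm, seq) / best


# Toy 3-position PWM (probabilities); the last two columns are fixed at T, G.
pwm = {"A": [0.50, 0.0, 0.0], "C": [0.25, 0.0, 0.0],
       "G": [0.25, 0.0, 1.0], "T": [0.00, 1.0, 0.0]}
print(pwm_score(pwm, "ATG"))                 # 2.5
print(round(relative_score(pwm, "CTG"), 3))  # 0.9
```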
Scores are written to file scores_kozak_JEC21.txt.
## # A tibble: 6,639 x 13
## Gene aATG.scorekn d1.scorekn d2.scorekn u1.scorekn aATG.scorekw
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNA0… 0.725 0.756 0.899 0.966 0.744
## 2 CNA0… 0.805 0.932 0.727 0.862 0.786
## 3 CNA0… 0.872 0.683 0.823 0.805 0.885
## 4 CNA0… 0.895 0.795 0.837 NA 0.858
## 5 CNA0… 0.964 0.741 0.839 NA 0.866
## 6 CNA0… 1. 0.884 0.698 NA 0.943
## 7 CNA0… 0.977 0.803 0.763 NA 0.919
## 8 CNA0… 0.781 0.964 0.758 0.733 0.769
## 9 CNA0… 0.940 0.920 0.803 NA 0.929
## 10 CNA0… 0.851 0.816 0.781 NA 0.862
## # ... with 6,629 more rows, and 7 more variables: d1.scorekw <dbl>,
## # d2.scorekw <dbl>, u1.scorekw <dbl>, d1vsan <dbl>, u1vsan <dbl>,
## # d1vsaw <dbl>, u1vsaw <dbl>
This illustrates that the mutual information (MI) between pairs of nucleotides is generally weak, except for nucleotides sharing a codon in positions +4 to +12 and, secondarily, at positions -6 to -4.
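Here is a minimal sketch of pairwise MI between two columns of aligned context windows, computed from plug-in frequency estimates in bits. The toy data mirror a dependency like "T at one position predicts C at the next". The function names and sequences are hypothetical, and the original analysis may have used a different estimator or bias correction.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (bits) between two aligned columns of nucleotides."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same length")
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def column(seqs, i):
    """Nucleotides at 0-based index i of each aligned window."""
    return [s[i] for s in seqs]


# Toy windows: index 0 fully predicts index 1 (T -> C, A -> G); index 2 is independent.
seqs = ["TCA", "TCG", "AGA", "AGG"]
print(mutual_information(column(seqs, 0), column(seqs, 1)))  # 1.0
print(mutual_information(column(seqs, 0), column(seqs, 2)))  # 0.0
```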
So there is a strong tendency to have a C at -5 if there is a T at -6.
This is very interesting: essentially every sequence with an A at -4 also has an A at -2 and -1, whereas sequences with a C at -4 are less constrained.
This looks interesting and needs a better way of summarizing.
Red: genes with a high dATG Kozak score relative to the aATG score. Blue: highly translated genes. Purple: both.
Rnarrow = -0.07; Rwide = -0.054
Boxplots show only genes with enough RNA (enough_RNA).
The 330 genes (top 5%) with the largest narrow-score difference between dATG and aATG are in this list:
## # A tibble: 330 x 3
## Gene aATG.scorekn d1.scorekn
## <chr> <dbl> <dbl>
## 1 CNA01530 0.673 1.
## 2 CNB00520 0.683 0.989
## 3 CNI00670 0.698 1.
## 4 CNK00900 0.707 1.
## 5 CND02465 0.725 1.
## 6 CNI00690 0.673 0.945
## 7 CNG04505 0.730 1.
## 8 CNA00760 0.733 1.
## 9 CNI00340 0.706 0.966
## 10 CNB04570 0.730 0.989
## # ... with 320 more rows
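Genes with a high wide-score difference (dATG score more than 0.1 above the aATG score) and enough RNA are saved to file dvsaATG_highdiffw_enoughRNA_JEC21.txt.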
## # A tibble: 149 x 5
## Gene aATG.scorekw d1.scorekw d1.posATG d1.frame
## <chr> <dbl> <dbl> <dbl> <fct>
## 1 CNI00340 0.665 0.953 58 Out
## 2 CNI00690 0.631 0.913 33 In
## 3 CNA04990 0.669 0.939 102 In
## 4 CNF02520 0.665 0.925 111 In
## 5 CNB01775 0.661 0.912 21 In
## 6 CNC07140 0.664 0.912 250 Out
## 7 CNG01740 0.717 0.963 18 In
## 8 CNA05725 0.721 0.967 51 In
## 9 CNN00820 0.657 0.900 63 In
## 10 CNG01890 0.607 0.845 177 In
## # ... with 139 more rows
For top 3315 / 50% of genes by mean RNA TPM.
Mitochondrial presequence predictions (MitoFates) are in input file JEC21_mitofates.txt.
## # A tibble: 16 x 5
## # Groups: enoughR, d1vsaw0p1, d1.framefac [?]
## enoughR d1vsaw0p1 d1.framefac Pred_preseq n
## <fct> <fct> <fct> <fct> <int>
## 1 Yes d1lo In No 773
## 2 Yes d1lo In Yes 123
## 3 Yes d1lo Out No 2032
## 4 Yes d1lo Out Yes 219
## 5 Yes d1hi In No 52
## 6 Yes d1hi In Yes 37
## 7 Yes d1hi Out No 50
## 8 Yes d1hi Out Yes 10
## 9 No d1lo In No 826
## 10 No d1lo In Yes 62
## 11 No d1lo Out No 2125
## 12 No d1lo Out Yes 89
## 13 No d1hi In No 101
## 14 No d1hi In Yes 13
## 15 No d1hi Out No 96
## 16 No d1hi Out Yes 6
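The d1hi/d1lo split appears to use a 0.1 threshold on the wide score: the H99 tables label the groups "d < a + 0.1" and "a + 0.1 < d". In the printed data, d1.frame also equals d1.posATG modulo 3 (for example, 133 gives frame 1). Here is a minimal sketch of that classification and of a tally like the one in this table. It assumes frame 0 means "In" and uses a strict inequality at the threshold. The dictionary keys and the gene `HYP1` are hypothetical; the other three rows are taken from the JEC21 wide-score table.

```python
from collections import Counter


def classify(gene, threshold=0.1):
    """Return (score group, frame group) for one gene's first downstream ATG."""
    if gene["d1_scorekw"] > gene["aATG_scorekw"] + threshold:
        group = "d1hi"
    else:
        group = "d1lo"
    # ASSUMED: distance from aATG to d1ATG divisible by 3 means in frame.
    frame = "In" if gene["d1_posATG"] % 3 == 0 else "Out"
    return group, frame


genes = [
    {"Gene": "CNI00340", "aATG_scorekw": 0.665, "d1_scorekw": 0.953, "d1_posATG": 58},
    {"Gene": "CNI00690", "aATG_scorekw": 0.631, "d1_scorekw": 0.913, "d1_posATG": 33},
    {"Gene": "CNC07140", "aATG_scorekw": 0.664, "d1_scorekw": 0.912, "d1_posATG": 250},
    {"Gene": "HYP1", "aATG_scorekw": 0.85, "d1_scorekw": 0.80, "d1_posATG": 30},  # hypothetical
]
tally = Counter(classify(g) for g in genes)
print(tally)
```

The three real genes come out as Out, In and Out, matching the d1.frame labels printed for them.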
It’s just a subset: the dual-localized ones.
##
## Call:
## lm(formula = log10(TE) ~ uATGCtC, data = ribo_uct_JEC21)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.66605 -0.23280 0.01163 0.26156 1.31559
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.063006 0.008206 -7.678 2.12e-14 ***
## uATGCtC1 -0.105074 0.016795 -6.256 4.45e-10 ***
## uATGCtC2+ -0.336937 0.020395 -16.521 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3849 on 3312 degrees of freedom
## Multiple R-squared: 0.07855, Adjusted R-squared: 0.078
## F-statistic: 141.2 on 2 and 3312 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log10(TE) ~ uATGCt, data = ribo_uct_JEC21)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.38972 -0.23666 0.01207 0.25541 1.57982
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.087294 0.007194 -12.13 <2e-16 ***
## uATGCt -0.054908 0.003475 -15.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3866 on 3313 degrees of freedom
## Multiple R-squared: 0.0701, Adjusted R-squared: 0.06982
## F-statistic: 249.7 on 1 and 3313 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log10(TE) ~ aATG.pos, data = ribo_uct_JEC21)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.79204 -0.23817 0.01468 0.25840 1.50324
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0732841 0.0089969 -8.145 5.29e-16 ***
## aATG.pos -0.0004104 0.0000435 -9.435 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3956 on 3313 degrees of freedom
## Multiple R-squared: 0.02617, Adjusted R-squared: 0.02587
## F-statistic: 89.01 on 1 and 3313 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log10(TE) ~ uATGCtC + aATG.pos, data = ribo_uct_JEC21)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.63729 -0.23614 0.01226 0.26244 1.29497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.760e-02 9.458e-03 -5.033 5.09e-07 ***
## uATGCtC1 -1.005e-01 1.683e-02 -5.973 2.57e-09 ***
## uATGCtC2+ -3.073e-01 2.230e-02 -13.777 < 2e-16 ***
## aATG.pos -1.510e-04 4.628e-05 -3.263 0.00111 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3844 on 3311 degrees of freedom
## Multiple R-squared: 0.08151, Adjusted R-squared: 0.08067
## F-statistic: 97.94 on 3 and 3311 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log10(TE) ~ uATGCt + aATG.pos, data = ribo_uct_JEC21)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.39741 -0.23732 0.01119 0.25620 1.53043
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.021e-01 9.067e-03 -11.260 < 2e-16 ***
## uATGCt -6.476e-02 5.057e-03 -12.806 < 2e-16 ***
## aATG.pos 1.657e-04 6.186e-05 2.678 0.00744 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3863 on 3312 degrees of freedom
## Multiple R-squared: 0.07211, Adjusted R-squared: 0.07155
## F-statistic: 128.7 on 2 and 3312 DF, p-value: < 2.2e-16
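With a single categorical predictor such as uATGCtC (levels 0, 1, 2+), the least-squares fit from `lm` reduces to group means. The intercept is the mean of log10(TE) in the reference level, and each other coefficient is that level's mean minus the reference mean. Here is a minimal sketch of that equivalence on hypothetical TE values and uATG counts. It reproduces the point estimates and R-squared, but not the standard errors or p-values. The function names are hypothetical.

```python
import math
from statistics import mean


def uatg_class(n):
    """Collapse uATG counts into the factor levels 0, 1, 2+."""
    return "0" if n == 0 else ("1" if n == 1 else "2+")


def fit_one_factor(y, groups, reference="0"):
    """OLS of y on one factor with treatment coding, computed via group means.

    Returns (coefficients, R-squared).
    """
    by_group = {}
    for yi, g in zip(y, groups):
        by_group.setdefault(g, []).append(yi)
    means = {g: mean(v) for g, v in by_group.items()}
    coefs = {"(Intercept)": means[reference]}
    for g in sorted(means):
        if g != reference:
            coefs[g] = means[g] - means[reference]
    grand = mean(y)
    ss_tot = sum((yi - grand) ** 2 for yi in y)
    ss_res = sum((yi - means[g]) ** 2 for yi, g in zip(y, groups))
    return coefs, 1 - ss_res / ss_tot


# Hypothetical genes: TE values and uATG counts.
te = [1.2, 0.9, 0.7, 0.8, 0.3, 0.4]
uatg_counts = [0, 0, 1, 1, 3, 2]
coefs, r2 = fit_one_factor([math.log10(t) for t in te],
                           [uatg_class(n) for n in uatg_counts])
print(coefs, round(r2, 3))
```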
We suspect that uATG is associated with lower TE if the uATG has
This figure shows that, for genes with only 1 uATG, this correlation is weak.
## # A tibble: 2 x 6
## u1.posTSS20 R r.squared p.value R2label plabel
## <fct> <dbl> <dbl> <dbl> <chr> <chr>
## 1 uAUG > 20nt 0.0648 0.00419 0.275 R^2 == 0.0042 p == 0.28
## 2 uAUG ≤ 20nt 0.00648 0.0000420 0.897 R^2 == 4.2e-05 p == 0.9
Check these for ribosome occupancy at uATG.
## # A tibble: 7 x 8
## # Groups: Gene [7]
## Gene RNA RPF TE uATGCt uATGCtmin20 u1.cxtn u2.cxtn
## <chr> <dbl> <dbl> <dbl> <int> <int> <chr> <chr>
## 1 CNA07610 50.0 5.85 0.117 1 1 TCCGTATG <NA>
## 2 CNF00330 182. 3.20 0.0176 8 8 AAAAAATG CAAAAATG
## 3 CNG00290 58.3 4.38 0.0751 1 1 GCAGGATG <NA>
## 4 CNG04240 123. 5.50 0.0446 0 0 <NA> <NA>
## 5 CNH02210 42.4 1.68 0.0396 1 1 CCACAATG <NA>
## 6 CNL04930 203. 17.3 0.0853 2 2 CGACAATG ACTTTATG
## 7 CNM02470 171. 17.0 0.0993 2 2 CCAGAATG CCATCATG
For top 3315 / 50% of genes by mean RNA TPM.
For top 3315 / 50% of genes by mean RNA TPM, summarized by gene, both samples.
For top 3315 / 50% of genes by mean RNA TPM, with only a single uATG, summarized by gene, median across 4 samples.
## # A tibble: 2 x 6
## Type R r.squared p.value R2label plabel
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 RNA 0.115 0.0133 0.0276 R^2 == 0.013 p == 0.028
## 2 RPF 0.237 0.0562 0.0000122 R^2 == 0.056 p == 1.2e-05
## # A tibble: 6,790 x 4
## # Groups: Gene [6,790]
## Gene RNA RPF TE
## <chr> <dbl> <dbl> <dbl>
## 1 CNAG_06125 10270. 20140. 1.96
## 2 CNAG_06101 8775. 8494. 0.968
## 3 CNAG_05762 7529. 7499. 0.996
## 4 CNAG_00779 3896. 7432. 1.91
## 5 CNAG_03127 6254. 7164. 1.15
## 6 CNAG_06222 6631. 6772. 1.02
## 7 CNAG_04011 13306. 6772. 0.509
## 8 CNAG_01455 12670. 6548. 0.517
## 9 CNAG_05525 6970. 6461. 0.927
## 10 CNAG_03739 6515. 6383. 0.980
## # ... with 6,780 more rows
We also calculated hiTrans_H99, the top 5% (330) translated genes by RPF TPM.
## # A tibble: 6,794 x 19
## Gene aATG.context aATG.pos d1.context d1.posTSS d1.posATG d1.frame
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 CNAG… TACTTACGCGA… 70 AAATTCACT… 100 30 0
## 2 CNAG… GAACTTCGATC… 52 TCTCCCGCC… 114 62 2
## 3 CNAG… TGTCTCCTTGA… 104 ACTTACGCC… 189 85 1
## 4 CNAG… CACATACGTAA… 214 CCGAACGGC… 256 42 0
## 5 CNAG… GACTATACAAA… 55 GGAGGTGGG… 163 108 0
## 6 CNAG… AACCATACAAA… 99 CAAAGCCAT… 259 160 1
## 7 CNAG… ACCGTGCACAC… 75 GTATTCGGA… 105 30 0
## 8 CNAG… GTTTTCAACAG… 73 CCCATCAGA… 380 307 1
## 9 CNAG… GTACTATTGAA… 206 GAGGCTCCG… 513 307 1
## 10 CNAG… TACAAGCTTGA… 90 GGCCGCCTT… 151 61 1
## # ... with 6,784 more rows, and 12 more variables: d2.context <chr>,
## # d2.posTSS <dbl>, d2.posATG <dbl>, d2.frame <dbl>, u1.context <chr>,
## # u1.posTSS <dbl>, u1.posATG <dbl>, u1.frame <dbl>, u2.context <chr>,
## # u2.posTSS <dbl>, u2.posATG <dbl>, u2.frame <dbl>
That’s for hiTrans_H99, the top 5% (330) translated genes by RPF TPM.
Ideally this would be fixed more cleanly.
Venn diagram
First upstream ATG.
First downstream ATG.
Except for 3rd-codon-position bias.
We calculate motif scores against the position weight matrix (PWM) for both the narrow (-4 through the ATG) and the wide (-10 through the ATG) Kozak consensus motifs. These motifs are derived from the top 5% most highly translated genes.
Information content is defined as for a sequence logo; details at https://en.wikipedia.org/wiki/Sequence_logo.
The total information content equals the total height of the letters in the sequence logo, summed across positions.
## # A tibble: 6 x 4
## Genes ATG Width Infon
## <chr> <chr> <chr> <dbl>
## 1 All aATG narrow 0.943
## 2 HiTrans aATG narrow 2.96
## 3 CytoRibo aATG narrow 4.16
## 4 All d1ATG narrow 0.116
## 5 HiTrans d1ATG narrow 0.233
## 6 CytoRibo d1ATG narrow 0.492
The information content of the highly translated consensus (excluding the 6 bits contributed by the ATG itself) is 2.96 bits for the narrow motif and 3.88 bits for the wide motif.
Scores are written to file scores_kozak_H99.txt.
## # A tibble: 6,794 x 13
## Gene aATG.scorekn d1.scorekn d2.scorekn u1.scorekn aATG.scorekw
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG… 0.904 0.776 0.747 0.847 0.849
## 2 CNAG… 0.871 0.841 1 NA 0.817
## 3 CNAG… 0.786 0.849 0.720 NA 0.769
## 4 CNAG… 0.920 0.785 0.788 0.849 0.866
## 5 CNAG… 0.978 0.759 0.822 NA 0.929
## 6 CNAG… 0.978 0.704 0.763 NA 0.928
## 7 CNAG… 0.966 0.856 0.849 NA 0.873
## 8 CNAG… 0.878 0.834 0.978 NA 0.851
## 9 CNAG… 0.891 0.787 0.821 0.821 0.900
## 10 CNAG… 0.874 0.935 0.662 0.678 0.825
## # ... with 6,784 more rows, and 7 more variables: d1.scorekw <dbl>,
## # d2.scorekw <dbl>, u1.scorekw <dbl>, d1vsan <dbl>, u1vsan <dbl>,
## # d1vsaw <dbl>, u1vsaw <dbl>
## # A tibble: 1 x 4
## aATG.scorekw d1.scorekw d2.scorekw u1.scorekw
## <dbl> <dbl> <dbl> <dbl>
## 1 0.856 0.787 0.777 0.780
## # A tibble: 1 x 4
## aATG.scorekw d1.scorekw d2.scorekw u1.scorekw
## <dbl> <dbl> <dbl> <dbl>
## 1 0.926 0.782 0.775 0.755
## # A tibble: 4 x 3
## dvsmeda dvsmedhiTrans n
## <lgl> <lgl> <int>
## 1 FALSE FALSE 5667
## 2 TRUE FALSE 978
## 3 TRUE TRUE 131
## 4 NA NA 18
This illustrates that the mutual information (MI) between pairs of nucleotides is generally weak, except for nucleotides sharing a codon in positions +4 to +12 and, secondarily, at positions -6 to -4.
So there is a strong tendency to have a C at -5 if there is a T at -6.
This is uninformative because the counts with no -4A are so low. The -3A is even worse.
Filtered for enough RNA.
Rnarrow = -0.058; Rwide = -0.036
Boxplots show only genes with enough RNA (enough_RNA).
The 330 genes (top 5%) with the largest narrow-score difference between dATG and aATG are in this list:
## # A tibble: 330 x 3
## Gene aATG.scorekn d1.scorekn
## <chr> <dbl> <dbl>
## 1 CNAG_04764 0.633 0.990
## 2 CNAG_04147 0.636 0.990
## 3 CNAG_00165 0.662 1
## 4 CNAG_07473 0.633 0.968
## 5 CNAG_07801 0.662 0.978
## 6 CNAG_02259 0.662 0.978
## 7 CNAG_07776 0.688 1
## 8 CNAG_06751 0.691 0.990
## 9 CNAG_04179 0.696 0.990
## 10 CNAG_01092 0.675 0.966
## # ... with 320 more rows
Genes with a high difference in wide score, filtered for reasonable amounts of RNA, are saved to dvsaATG_highdiffw_enoughRNA_H99.txt.
## # A tibble: 167 x 5
## Gene aATG.scorekw d1.scorekw d1.posATG d1.frame
## <chr> <dbl> <dbl> <dbl> <fct>
## 1 CNAG_04147 0.611 0.929 57 In
## 2 CNAG_02259 0.641 0.953 87 In
## 3 CNAG_04764 0.616 0.905 30 In
## 4 CNAG_07801 0.682 0.945 57 In
## 5 CNAG_04179 0.691 0.947 99 In
## 6 CNAG_07776 0.712 0.967 36 In
## 7 CNAG_07473 0.639 0.893 87 In
## 8 CNAG_03953 0.687 0.940 72 In
## 9 CNAG_03486 0.700 0.951 30 In
## 10 CNAG_05722 0.652 0.901 111 In
## # ... with 157 more rows
For top 3315 / 50% of genes by mean RNA TPM.
## # A tibble: 4 x 3
## # Groups: Type [2]
## Type d1vsaScut R
## <chr> <fct> <dbl>
## 1 RNA (-1,0] 0.113
## 2 RNA (0,1] 0.201
## 3 RPF (-1,0] 0.0127
## 4 RPF (0,1] 0.307
Mitochondrial presequence predictions (MitoFates) are in input file H99_mitofates.txt.
## # A tibble: 16 x 5
## # Groups: enoughR, d1vsaw0p1, d1.framefac [?]
## enoughR d1vsaw0p1 d1.framefac Pred_preseq n
## <fct> <fct> <fct> <fct> <int>
## 1 Yes d1lo In No 752
## 2 Yes d1lo In Yes 120
## 3 Yes d1lo Out No 2018
## 4 Yes d1lo Out Yes 241
## 5 Yes d1hi In No 69
## 6 Yes d1hi In Yes 41
## 7 Yes d1hi Out No 53
## 8 Yes d1hi Out Yes 4
## 9 No d1lo In No 906
## 10 No d1lo In Yes 55
## 11 No d1lo Out No 2222
## 12 No d1lo Out Yes 81
## 13 No d1hi In No 98
## 14 No d1hi In Yes 10
## 15 No d1hi Out No 91
## 16 No d1hi Out Yes 7
## # A tibble: 4 x 7
## # Groups: enoughRNA [?]
## enoughRNA d1vsawfac n d1pos.med d1pos.mean d2pos.med d2pos.mean
## <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 enough RNA "AUG score\n… 3131 79 98.6 146 165.
## 2 enough RNA "AUG score\n… 167 66 79.8 152 164.
## 3 not enough "AUG score\n… 3269 73 91.1 137 157.
## 4 not enough "AUG score\n… 208 51 75.6 120. 140.
## # A tibble: 8 x 8
## # Groups: enoughRNA, d1vsawfac [?]
## enoughRNA d1vsawfac d1.framefac n d1pos.med d1pos.mean d2pos.med
## <fct> <fct> <fct> <int> <dbl> <dbl> <dbl>
## 1 enough R… "AUG sco… In 872 57 81.6 130
## 2 enough R… "AUG sco… Out 2259 86 105. 151
## 3 enough R… "AUG sco… In 110 69 79.3 155
## 4 enough R… "AUG sco… Out 57 61 80.7 141
## 5 not enou… "AUG sco… In 962 55.5 79.4 130
## 6 not enou… "AUG sco… Out 2307 80 96.0 140
## 7 not enou… "AUG sco… In 108 45 73.1 120.
## 8 not enou… "AUG sco… Out 100 61.5 78.2 120.
## # ... with 1 more variable: d2pos.mean <dbl>
It’s just a subset: the dual-localized ones.
DeepLoc predictions are in input file H99_DeepLoc.txt.
SignalP predictions are in input file H99_SignalP.txt.
## # A tibble: 15 x 5
## # Groups: enoughRNA, d1vsawfac, d1.framefac [?]
## enoughRNA d1vsawfac d1.framefac SignalP n
## <fct> <fct> <fct> <chr> <int>
## 1 enough RNA "AUG score\nd < a + 0.1" In N 836
## 2 enough RNA "AUG score\nd < a + 0.1" In Y 36
## 3 enough RNA "AUG score\nd < a + 0.1" Out N 2148
## 4 enough RNA "AUG score\nd < a + 0.1" Out Y 111
## 5 enough RNA "AUG score\na + 0.1 < d" In N 107
## 6 enough RNA "AUG score\na + 0.1 < d" In Y 3
## 7 enough RNA "AUG score\na + 0.1 < d" Out N 57
## 8 not enough "AUG score\nd < a + 0.1" In N 909
## 9 not enough "AUG score\nd < a + 0.1" In Y 52
## 10 not enough "AUG score\nd < a + 0.1" Out N 2163
## 11 not enough "AUG score\nd < a + 0.1" Out Y 141
## 12 not enough "AUG score\na + 0.1 < d" In N 105
## 13 not enough "AUG score\na + 0.1 < d" In Y 3
## 14 not enough "AUG score\na + 0.1 < d" Out N 96
## 15 not enough "AUG score\na + 0.1 < d" Out Y 2
##
## Call:
## lm(formula = log10(TE) ~ uATGCtC, data = ribo_uct_H99)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.96753 -0.15177 0.00575 0.16272 0.82719
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.028117 0.005544 -5.072 4.16e-07 ***
## uATGCtC1 -0.085826 0.011755 -7.301 3.56e-13 ***
## uATGCtC2+ -0.239834 0.014692 -16.324 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2649 on 3312 degrees of freedom
## Multiple R-squared: 0.0796, Adjusted R-squared: 0.07904
## F-statistic: 143.2 on 2 and 3312 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log10(TE) ~ uATGCt, data = ribo_uct_H99)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.75308 -0.15226 0.00529 0.16283 1.30027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.050872 0.004916 -10.35 <2e-16 ***
## uATGCt -0.034529 0.002496 -13.83 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2684 on 3313 degrees of freedom
## Multiple R-squared: 0.05459, Adjusted R-squared: 0.05431
## F-statistic: 191.3 on 1 and 3313 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log10(TE) ~ aATG.pos, data = ribo_uct_H99)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.89094 -0.15541 0.00616 0.16553 0.90609
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.671e-02 6.540e-03 -4.084 4.53e-05 ***
## aATG.pos -3.599e-04 3.561e-05 -10.107 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2719 on 3313 degrees of freedom
## Multiple R-squared: 0.02991, Adjusted R-squared: 0.02962
## F-statistic: 102.1 on 1 and 3313 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log10(TE) ~ uATGCtC + aATG.pos, data = ribo_uct_H99)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.87039 -0.15303 0.00436 0.16525 0.84498
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.223e-02 6.708e-03 -1.823 0.0684 .
## uATGCtC1 -8.003e-02 1.181e-02 -6.778 1.44e-11 ***
## uATGCtC2+ -2.139e-01 1.591e-02 -13.441 < 2e-16 ***
## aATG.pos -1.574e-04 3.761e-05 -4.185 2.93e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2642 on 3311 degrees of freedom
## Multiple R-squared: 0.08444, Adjusted R-squared: 0.08361
## F-statistic: 101.8 on 3 and 3311 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log10(TE) ~ uATGCt + aATG.pos, data = ribo_uct_H99)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.74936 -0.15208 0.00529 0.16258 1.30271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.939e-02 6.902e-03 -7.156 1.02e-12 ***
## uATGCt -3.373e-02 3.625e-03 -9.304 < 2e-16 ***
## aATG.pos -1.559e-05 5.105e-05 -0.305 0.76
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2685 on 3312 degrees of freedom
## Multiple R-squared: 0.05462, Adjusted R-squared: 0.05405
## F-statistic: 95.68 on 2 and 3312 DF, p-value: < 2.2e-16
We suspect that uATG is associated with lower TE if the uATG has
This figure shows that, for genes with only 1 uATG, this correlation is weak.
## # A tibble: 2 x 6
## u1.posTSS20 R r.squared p.value R2label plabel
## <fct> <dbl> <dbl> <dbl> <chr> <chr>
## 1 uAUG ≤ 20nt 0.0415 0.00172 0.415 R^2 == 0.0017 p == 0.41
## 2 uAUG > 20nt 0.103 0.0105 0.0954 R^2 == 0.011 p == 0.095
Several of these (CNAG_03140, CNAG_07695, CNAG_06246) are strongly translationally repressed and have good context at the uATG.
## # A tibble: 8 x 8
## # Groups: Gene [8]
## Gene RNA RPF TE uATGCt uATGCtmin20 u1.cxtn u2.cxtn
## <chr> <dbl> <dbl> <dbl> <int> <int> <chr> <chr>
## 1 CNAG_00784 52.8 6.69 0.127 1 0 TCCGTATG <NA>
## 2 CNAG_01709 155. 176. 1.13 1 1 AGTTCATG <NA>
## 3 CNAG_03140 187. 2.00 0.0107 6 6 GGAAAATG GACAAATG
## 4 CNAG_03578 43.5 6.20 0.143 1 1 GCAGGATG <NA>
## 5 CNAG_05574 30.0 2.66 0.0885 1 1 CCACAATG <NA>
## 6 CNAG_06246 196. 24.6 0.125 2 2 CCAGAATG CCATCATG
## 7 CNAG_07695 164. 5.69 0.0346 16 16 CAAAAATG CAAGAATG
## 8 CNAG_07813 148. 20.1 0.136 1 1 CGGCAATG <NA>
## # A tibble: 8 x 4
## Gene asw u1sw u2sw
## <chr> <dbl> <dbl> <dbl>
## 1 CNAG_00784 0.747 0.703 NA
## 2 CNAG_01709 0.944 0.722 NA
## 3 CNAG_03140 0.836 0.851 0.758
## 4 CNAG_03578 0.680 0.824 NA
## 5 CNAG_05574 0.781 0.872 NA
## 6 CNAG_06246 0.776 0.933 0.864
## 7 CNAG_07695 0.690 0.967 0.896
## 8 CNAG_07813 0.862 0.792 NA
For top 3315 / 50% of genes by mean RNA TPM.
For top 3315 / 50% of genes by mean RNA TPM, summarized by gene, all 4 samples.
For top 3315 / 50% of genes by mean RNA TPM, with only a single uATG, summarized by gene, median across 4 samples.
## # A tibble: 2 x 6
## Type R r.squared p.value R2label plabel
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 RNA 0.187 0.0351 0.000822 R^2 == 0.035 p == 8.2e-04
## 2 RPF 0.275 0.0755 0.000000466 R^2 == 0.076 p == 4.7e-07
H99/JEC21 gene pairs are taken from the 2016 paper.
## # A tibble: 6,341 x 2
## H99 JEC21
## <chr> <chr>
## 1 CNAG_01397 CND05080
## 2 CNAG_07825 CNH03545
## 3 CNAG_05539 CNH01890
## 4 CNAG_03635 CNB01365
## 5 CNAG_06621 CNF03970
## 6 CNAG_00830 CNA08090
## 7 CNAG_07556 CNK01100
## 8 CNAG_06796 CNB00060
## 9 CNAG_06009 CNM00180
## 10 CNAG_03522 CNG00710
## # ... with 6,331 more rows
## # A tibble: 20 x 8
## H99 JEC21 RNA.H99 RPF.H99 TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_06125 CNM01300 10270. 20140. 1.96 3981. 18260. 4.59
## 2 CNAG_06101 CNM01080 8775. 8494. 0.968 8422. 8957. 1.06
## 3 CNAG_00779 CNA07570 3896. 7432. 1.91 5811. 7134. 1.23
## 4 CNAG_03127 CNG04360 6254. 7164. 1.15 3171. 7048. 2.22
## 5 CNAG_05762 CNF02150 7529. 7499. 0.996 15611. 6144. 0.394
## 6 CNAG_03739 CNB02360 6515. 6383. 0.980 3764. 6958. 1.85
## 7 CNAG_06222 CNM02240 6631. 6772. 1.02 4689. 6013. 1.28
## 8 CNAG_00655 CNA06350 12483. 6041. 0.484 15019. 6591. 0.439
## 9 CNAG_04011 CNB04930 13306. 6772. 0.509 19847. 5650. 0.285
## 10 CNAG_06633 CNF03840 9267. 6136. 0.662 11321. 6175. 0.545
## 11 CNAG_01332 CND04480 5923. 6076. 1.03 4638. 5976. 1.29
## 12 CNAG_03015 CNC00700 4856. 5691. 1.17 2344. 6224. 2.65
## 13 CNAG_04448 CNI01090 6654. 5950. 0.894 5345. 5891. 1.10
## 14 CNAG_00640 CNA06200 7701. 5784. 0.751 4883. 6037. 1.24
## 15 CNAG_00771 CNA07490 6955. 5860. 0.843 7040. 5959. 0.846
## 16 CNAG_04883 CNJ03110 4270. 5872. 1.38 5747. 5908. 1.03
## 17 CNAG_04726 CNJ01560 8038. 6353. 0.790 6386. 5396. 0.845
## 18 CNAG_00672 CNA06500 9191. 6054. 0.659 14270. 5645. 0.396
## 19 CNAG_05525 CNH01770 6970. 6461. 0.927 4051. 5204. 1.28
## 20 CNAG_03780 CNB02750 6868. 5685. 0.828 5180. 5878. 1.13
## # A tibble: 20 x 8
## H99 JEC21 RNA.H99 RPF.H99 TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_01130 CND02530 47.7 301. 6.30 32.7 297. 9.09
## 2 CNAG_01890 CNK02310 248. 1445. 5.84 279. 1901. 6.82
## 3 CNAG_06150 CNM01520 607. 3347. 5.51 539. 2928. 5.43
## 4 CNAG_02994 CNC06020 68.5 261. 3.81 31.1 218. 7.01
## 5 CNAG_01750 CNC02520 256. 1307. 5.10 311. 1655. 5.33
## 6 CNAG_01727 CNC02320 736. 3570. 4.85 706. 3747. 5.31
## 7 CNAG_01744 CNC02470 104. 315. 3.04 33.5 237. 7.07
## 8 CNAG_04327 CNI02220 44.7 201. 4.51 35.2 197. 5.58
## 9 CNAG_01117 CND02420 436. 2095. 4.81 439. 2275. 5.18
## 10 CNAG_05907 CNF00650 94.5 298. 3.15 66.3 450. 6.79
## 11 CNAG_04640 CNJ00800 217. 822. 3.80 159. 969. 6.10
## 12 CNAG_04313 CNI02360 204. 225. 1.10 30.8 254. 8.23
## 13 CNAG_07373 CNA06000 65.7 302. 4.59 77.8 354. 4.55
## 14 CNAG_05602 CNH02450 381. 675. 1.77 27.4 192. 7.00
## 15 CNAG_06840 CND06220 1196. 2878. 2.41 460. 2895. 6.29
## 16 CNAG_00136 CNA01230 46.4 197. 4.24 45.3 199. 4.40
## 17 CNAG_05884 CNF00890 79.8 294. 3.69 72.8 360. 4.95
## 18 CNAG_06208 CNM02070 251. 976. 3.89 231. 986. 4.27
## 19 CNAG_00992 CND01200 254. 889. 3.50 262. 1168. 4.47
## 20 CNAG_04659 CNJ00950 25.8 55.6 2.16 26.3 152. 5.80
To-do: Check which of these have uATGs.
## # A tibble: 20 x 8
## H99 JEC21 RNA.H99 RPF.H99 TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_07695 CNF00330 164. 5.69 0.0346 182. 3.20 0.0176
## 2 CNAG_03140 CNG04240 187. 2.00 0.0107 123. 5.50 0.0446
## 3 CNAG_05574 CNH02210 30.0 2.66 0.0885 42.4 1.68 0.0396
## 4 CNAG_04855 CNJ02770 30.4 2.67 0.0879 87.9 6.69 0.0761
## 5 CNAG_06614 CNF04050 41.8 4.39 0.105 57.8 4.57 0.0791
## 6 CNAG_02323 CNE02240 39.1 3.59 0.0918 49.9 5.12 0.103
## 7 CNAG_03578 CNG00290 43.5 6.20 0.143 58.3 4.38 0.0751
## 8 CNAG_07813 CNL04930 148. 20.1 0.136 203. 17.3 0.0853
## 9 CNAG_06246 CNM02470 196. 24.6 0.125 171. 17.0 0.0993
## 10 CNAG_05319 CNH03140 35.4 0.602 0.0170 35.4 7.69 0.218
## 11 CNAG_00784 CNA07610 52.8 6.69 0.127 50.0 5.85 0.117
## 12 CNAG_08027 CNH02090 25.6 2.75 0.107 89.8 13.6 0.152
## 13 CNAG_02433 CNE01240 38.9 6.78 0.174 114. 11.1 0.0970
## 14 CNAG_00529 CNA05110 40.2 7.86 0.196 96.7 7.83 0.0809
## 15 CNAG_05237 CNL03915 33.1 9.02 0.273 72.1 0.277 0.00384
## 16 CNAG_05288 CNH03430 56.4 9.02 0.160 70.0 8.83 0.126
## 17 CNAG_02867 CNC04820 54.5 7.14 0.131 47.6 7.51 0.158
## 18 CNAG_01624 CNC01375 34.5 5.07 0.147 30.4 4.36 0.143
## 19 CNAG_05567 CNH02150 36.1 6.85 0.189 75.9 7.98 0.105
## 20 CNAG_06782 CNB00170 53.7 8.43 0.157 59.8 9.21 0.154
We take transcripts where the overall gene expression (RNA abundance in top 50%), the difference in score (dATG > aATG in top 5%), and the dATG frame are all conserved between H99 and JEC21.
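Here is a minimal sketch of this conservation filter. It keeps an ortholog pair only if both genes are hits in their own strain and their dATG frames agree. It assumes each strain's hits have already passed the RNA and score-difference filters and are stored as gene ID to frame ("In" or "Out"). The function name and the pair `HYP_A`/`HYP_B` are hypothetical. The two real pairs use IDs from the tables in this document; in those tables, the dATG of CNI02850 does not outscore its aATG in JEC21.

```python
def conserved_hits(orthologs, hits_h99, hits_jec21):
    """Ortholog pairs that are hits in both strains with the same dATG frame.

    hits_*: dict mapping gene ID -> frame ("In" or "Out") for genes that already
    pass the RNA and score-difference filters in that strain.
    """
    result = {"In": [], "Out": []}
    for h99, jec21 in orthologs:
        frame_h = hits_h99.get(h99)
        frame_j = hits_jec21.get(jec21)
        if frame_h is not None and frame_h == frame_j:
            result[frame_h].append((h99, jec21))
    return result


orthologs = [("CNAG_05722", "CNF02520"), ("CNAG_04147", "CNI02850"), ("HYP_A", "HYP_B")]
hits_h99 = {"CNAG_05722": "In", "CNAG_04147": "In", "HYP_A": "In"}
hits_jec21 = {"CNF02520": "In", "HYP_B": "Out"}  # CNI02850 is not a JEC21 hit
print(conserved_hits(orthologs, hits_h99, hits_jec21))
# {'In': [('CNAG_05722', 'CNF02520')], 'Out': []}
```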
Saved to file dvsaATG_highdiffw_inframe_cc.txt.
## # A tibble: 47 x 6
## H99 JEC21 a.skw.H99 d.skw.H99 a.skw.JEC21 d.skw.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_05722 CNF02520 0.652 0.901 0.665 0.925
## 2 CNAG_00517 CNA04990 0.692 0.925 0.669 0.939
## 3 CNAG_02259 CNE02870 0.641 0.953 0.657 0.840
## 4 CNAG_06353 CNN00820 0.648 0.895 0.657 0.900
## 5 CNAG_07776 CNI00670 0.712 0.967 0.730 0.961
## 6 CNAG_03396 CNG01890 0.602 0.842 0.607 0.845
## 7 CNAG_04179 CNI03160 0.691 0.947 0.705 0.922
## 8 CNAG_07801 CNL06190 0.682 0.945 0.690 0.893
## 9 CNAG_00165 CNA01530 0.686 0.919 0.699 0.922
## 10 CNAG_03953 CNB04410 0.687 0.940 0.739 0.940
## 11 CNAG_01544 CNC06400 0.696 0.926 0.691 0.909
## 12 CNAG_02545 CNE00210 0.680 0.904 0.696 0.910
## 13 CNAG_07473 CNB01880 0.639 0.893 0.664 0.830
## 14 CNAG_04905 CNJ03280 0.795 0.989 0.786 0.991
## 15 CNAG_05900 CNF00730 0.775 0.975 0.785 0.975
## 16 CNAG_00589 CNA05725 0.743 0.888 0.721 0.967
## 17 CNAG_04219 CNI03610 0.748 0.949 0.758 0.945
## 18 CNAG_06713 CNB00900 0.749 0.962 0.754 0.929
## 19 CNAG_04017 CNB04990 0.750 0.935 0.743 0.940
## 20 CNAG_02880 CNC04930 0.676 0.877 0.692 0.867
## # ... with 27 more rows
Saved to file dvsaATG_highdiffw_outframe_cc.txt.
## # A tibble: 17 x 6
## H99 JEC21 a.skw.H99 d.skw.H99 a.skw.JEC21 d.skw.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_06278 CNN00160 0.693 0.931 0.721 0.930
## 2 CNAG_03370 CNG02120 0.738 0.941 0.770 0.945
## 3 CNAG_00973 CND01020 0.791 0.963 0.778 0.965
## 4 CNAG_03008 CNC06190 0.826 0.993 0.837 0.994
## 5 CNAG_04054 CNB05380 0.716 0.881 0.739 0.881
## 6 CNAG_03363 CNG02175 0.671 0.828 0.671 0.810
## 7 CNAG_07680 CNF00905 0.792 0.968 0.851 0.969
## 8 CNAG_02809 CNC04270 0.782 0.921 0.776 0.910
## 9 CNAG_02894 CNC05065 0.778 0.923 0.800 0.927
## 10 CNAG_04896 CNJ03200 0.797 0.923 0.797 0.924
## 11 CNAG_00453 CNA04340 0.751 0.893 0.782 0.893
## 12 CNAG_02578 CNK00690 0.802 0.910 0.781 0.909
## 13 CNAG_04281 CNI02680 0.813 0.922 0.750 0.874
## 14 CNAG_04959 CNL06560 0.667 0.796 0.707 0.810
## 15 CNAG_06093 CNM01000 0.829 0.935 0.814 0.933
## 16 CNAG_01667 CNC01780 0.826 0.930 0.829 0.937
## 17 CNAG_01098 CND02240 0.724 0.830 0.733 0.838
Filtered for enough RNA (top 50%).
## # A tibble: 117 x 6
## H99 JEC21 a.skw.H99 d.skw.H99 a.skw.JEC21 d.skw.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_07645 CNE03115 0.569 0.704 0.867 0.729
## 2 CNAG_04147 CNI02850 0.611 0.929 0.904 0.891
## 3 CNAG_02131 CNE04050 0.718 0.869 0.978 0.786
## 4 CNAG_03486 CNG01060 0.700 0.951 0.954 0.706
## 5 CNAG_02795 CNC04140 0.599 0.701 0.845 0.713
## 6 CNAG_01092 CND02180 0.632 0.880 0.875 0.831
## 7 CNAG_06000 CNM00090 0.681 0.858 0.917 0.748
## 8 CNAG_06196 CNM01950 0.726 0.580 0.948 0.708
## 9 CNAG_00690 CNA06680 0.672 0.828 0.888 0.666
## 10 CNAG_03269 CNG03080 0.659 0.865 0.867 0.725
## # ... with 107 more rows
## # A tibble: 117 x 6
## H99 JEC21 a.skw.H99 d.skw.H99 a.skw.JEC21 d.skw.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_04089 CNB05680 0.956 0.729 0.679 0.882
## 2 CNAG_04488 CNI00690 0.906 0.950 0.631 0.913
## 3 CNAG_04751 CNJ01820 0.905 0.818 0.666 0.858
## 4 CNAG_03968 CNB04570 0.956 0.771 0.721 0.954
## 5 CNAG_05504 CNH01580 0.934 0.737 0.710 0.815
## 6 CNAG_04362 CNI01880 0.941 0.693 0.723 0.736
## 7 CNAG_02703 CNK01910 0.898 0.696 0.687 0.884
## 8 CNAG_00529 CNA05110 0.852 0.817 0.644 0.833
## 9 CNAG_03410 CNG01740 0.914 0.661 0.717 0.963
## 10 CNAG_06088 CNM00950 0.943 0.822 0.751 0.971
## # ... with 107 more rows
Many of these have the expected structure, in which the homologs differ only at the N-terminus. There appears to be a swap between the species: a near-ATG start codon in one strain and a poor-context ATG in the other.
Higher aAUG score in JEC21:
Higher aAUG score in H99:
These mostly look misannotated in one strain, or uninteresting. Is the upstream start codon in one strain actually used? Check for ribosome footprints and for other features (homology, mitochondrial localization sequence). An additional filter here would be useful.
We looked at whether genes with different predicted mitochondrial localization in the two strains could have swapped a poor ATG for a near-ATG start codon. However, we did not find good evidence for that. It might happen if alternative TSSs are used.
Filtered for enough RNA (top 50%).
## # A tibble: 28 x 4
## H99 JEC21 Prob_preseq.H99 Prob_preseq.JEC21
## <chr> <chr> <dbl> <dbl>
## 1 CNAG_07163 CNE02790 1 0
## 2 CNAG_04443 CNI01140 0.999 0
## 3 CNAG_02354 CNE01960 0.997 0.064
## 4 CNAG_05511 CNH01640 0.995 0
## 5 CNAG_04664 CNJ00980 0.984 0.056
## 6 CNAG_00427 CNA04120 0.979 0.063
## 7 CNAG_01522 CNC06600 0.966 0
## 8 CNAG_07352 CNA03190 0.916 0.18
## 9 CNAG_05058 CNL05610 0.859 0.113
## 10 CNAG_01145 CND02700 0.841 0.34
## # ... with 18 more rows
## # A tibble: 29 x 4
## H99 JEC21 Prob_preseq.H99 Prob_preseq.JEC21
## <chr> <chr> <dbl> <dbl>
## 1 CNAG_00304 CNA02885 0 0.981
## 2 CNAG_02033 CNE05020 0.174 0.955
## 3 CNAG_03649 CNB01500 0.228 0.733
## 4 CNAG_03403 CNG01800 0.302 0.729
## 5 CNAG_02794 CNC04130 0.005 0.68
## 6 CNAG_06328 CNN00560 0.349 0.661
## 7 CNAG_04539 CNI00150 0.371 0.655
## 8 CNAG_03540 CNG00570 0.292 0.65
## 9 CNAG_03984 CNB04715 0.367 0.615
## 10 CNAG_04801 CNJ02280 0.316 0.601
## # ... with 19 more rows
There seem to be more cases of mitochondrial-targeting gain in H99, or alternatively of non-ATG starts in JEC21.
CNAG_01145, australin/borealin related chromosome passenger protein
CNAG_06328, hypothetical protein, some fungal homologs, alpha/beta barrels
This was done on 25 June, with values generated by CryptoATGcontext at that time. It is not a reproducible analysis here!
I performed GO analysis with PANTHER.db on JEC21 gene names, using PANTHER version 13.1 (released 2018-02-03) and the overrepresentation test on GO-slim terms.
Link: http://www.pantherdb.org/tools/compareToRefList.jsp
File dvsaATG_highdiffn_outframe_cc.txt.
No significant GO terms.
File dvsaATG_highdiffn_inframe_cc.txt.
Enriched Biological Processes:
Molecular Function:
Cellular Component:
File hiTrans_cc.txt.
Enriched BPs include:
Enriched MFs include:
Enriched CCs include:
File hiTE_cc.txt.
Enriched BPs include:
Enriched MFs include:
Enriched CCs include:
File loTE_cc.txt.
Enriched BPs include:
Enriched MFs: no significant results.
Enriched CCs: no significant results.